We derive an inequality relating the entropy difference between two
quantum states to their trace norm distance, sharpening a well-known
inequality due to M. Fannes. In our inequality, equality can be attained
for every prescribed value of the trace norm distance.

PACS numbers: 03.65.Hk 
I. INTRODUCTION

The initial motivation of the present paper was a purely pedagogical
issue: given the ubiquity of powerful computers on nearly every desk [1],
one should be able to quickly illustrate (rather than prove) the validity
of many basic inequalities. In quantum mechanics, and in Quantum
Information Theory in particular, perhaps the best known inequality is
the eponymous continuity inequality [3] for the von Neumann entropy,
discovered by M. Fannes. This inequality gives an upper bound on the
absolute value of the difference between the von Neumann entropies of two
finite-dimensional quantum states, in terms of their trace norm distance.

The inequality can easily be illustrated using a computer, as 
it deals with finite-dimensional quantum states and each of its 
constituents can be calculated efficiently. What one has to do 
is to generate random pairs of states, calculate both the trace 
norm distance and the absolute value of the difference of their 
von Neumann entropies, and produce a scatter plot of these 
two quantities. Adding to that a graph of the upper bound, 
one should see a cloud of points lying below the latter graph. 
Indeed, only a few minutes of work is required to produce
plots akin to those of Figures 1, 2 and 3.
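
As a hedged illustration of the procedure just described, the following Python sketch generates the scatter data for qubits only. It relies on two standard facts that are assumptions relative to the text: a qubit state with Bloch vector v has eigenvalues (1 ± |v|)/2, and the trace norm distance (including a factor 1/2) between two qubit states equals half the Euclidean distance between their Bloch vectors. The sampling ensemble (uniform in the Bloch ball) and all function names are illustrative choices, not necessarily those used to produce the figures.

```python
import math
import random

def h2(x):
    """Scalar term -x*log2(x), with the convention 0*log2(0) = 0."""
    return 0.0 if x <= 0.0 else -x * math.log2(x)

def random_bloch_vector(rng):
    """Point drawn uniformly from the unit ball by rejection sampling (assumed ensemble)."""
    while True:
        v = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        if sum(c * c for c in v) <= 1.0:
            return v

def qubit_entropy(v):
    """von Neumann entropy of the qubit with Bloch vector v; eigenvalues are (1 +/- |v|)/2."""
    r = math.sqrt(sum(c * c for c in v))
    return h2((1.0 + r) / 2.0) + h2((1.0 - r) / 2.0)

def qubit_trace_distance(v, w):
    """Half the trace norm of the difference of two qubit states, i.e. |v - w| / 2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w))) / 2.0

def scatter_points(n, seed=0):
    """Return n pairs (trace norm distance, absolute entropy difference)."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        v, w = random_bloch_vector(rng), random_bloch_vector(rng)
        pts.append((qubit_trace_distance(v, w),
                    abs(qubit_entropy(v) - qubit_entropy(w))))
    return pts

points = scatter_points(2000)
```

Feeding `points` to any plotting tool, with the distance on the horizontal axis, should give a cloud comparable in spirit to Figure 1, although its density depends on the chosen ensemble.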

Now one directly sees that the bound is indeed an upper bound, but also
that the bound is not sharp. There are no points on the graph, or even
near it. This is certainly not a problem for the originally intended use
of the bound (proving a continuity property of the von Neumann entropy).
Nevertheless, like the present author, one could feel compelled to find a
better bound: a sharp bound that exactly describes the upper boundary of
the cloud of randomly generated points.

In [6], the author, together with J. Eisert, did exactly this for the
relative entropy, which is in a sense a quantity derived from the von
Neumann entropy. In the present paper, the same is done for the von
Neumann entropy itself. The present paper could therefore be considered
the 'prequel' of [6]. The outcome is a new, sharp bound, of the same type
as Fannes' one, and, rather surprisingly, of the same complexity.

As mentioned, there are no real benefits in the new bound with respect to
proving continuity of the von Neumann entropy. However, in recent times,
new usage of such a bound has been found, e.g. in entanglement theory.
For this modern usage our bound has the important benefit that it is
actually easier to use, because it is valid over the whole range of
possible values of the trace norm distance, unlike Fannes' one, which
only holds for $\|\rho - \sigma\|_1 \le 1/e$ (that is, $T \le 1/(2e)$ in
the normalisation used below) and has to be modified for larger values.
Furthermore, it is the sharpest bound possible and improves on the older
one. The only added cost of the new bound lies in its proof, which is
much longer.

Before stating the main result, let us first introduce some notations.
The acronyms LHS and RHS are short for left-hand side and right-hand
side. To denote Hermitian conjugate, we follow mathematical conventions
and use the asterisk rather than the dagger. The notation
$\mathrm{Diag}(x, y, z, \ldots)$ denotes the diagonal matrix with
diagonal elements $x, y, z, \ldots$, and $\mathrm{Eig}^{\downarrow}(A)$
denotes the vector of eigenvalues of a Hermitian matrix $A$, sorted in
non-increasing order. Following information-theoretical convention, we
use base-2 logarithms, denoted by $\log_2$. The natural logarithm will be
denoted by $\ln$. The von Neumann (vN) entropy, when expressed in units
of qubits, is then defined as



$$S(\rho) := -\mathrm{Tr}[\rho \log_2 \rho]. \tag{1}$$



For classical probability distributions, this reduces to the
Shannon entropy

$$H(p) := -\sum_i p_i \log_2 p_i, \tag{2}$$

where $p$ is a probability vector. We will occasionally indulge in
overloaded usage of the symbol $H$ and define $H(x) := -x \log_2 x$ for
non-negative scalars $x$. Thus the relation $H(p) = \sum_i H(p_i)$ holds.

We use the following definition for trace norm distance:

$$T(\rho, \sigma) := \|\rho - \sigma\|_1 / 2, \tag{3}$$

including the factor $1/2$ to have $T$ between 0 and 1.

The original inequality for the continuity of the vN entropy,
as proven by Fannes [3,5], reads:

$$|S(\rho) - S(\sigma)| \le 2T \log_2(d) - 2T \log_2(2T), \tag{4}$$

which is valid for $0 \le T \le 1/(2e)$. For larger $T$ one can use
the weaker inequality

$$|S(\rho) - S(\sigma)| \le 2T \log_2(d) + 1/(e \ln(2)). \tag{5}$$
Our main result is a sharpening of these inequalities:



FIG. 1: Scatter plot of 20000 randomly generated pairs $(\rho, \sigma)$
of qubit states ($d = 2$); shown is the trace norm distance
$T = \|\rho - \sigma\|_1/2$ versus the difference
$\Delta = |S(\rho) - S(\sigma)|$ of the vN entropies. The upper curve in
the interval $0 \le T \le 1/(2e)$ represents the Fannes bound (4). The
lower curve represents our sharp bound (6) and is seen to follow the
boundary of the set of scatter points tightly.

Theorem 1 For all $d$-dimensional states $\rho$, $\sigma$ such that their
trace norm distance is given by $T$,

$$|S(\rho) - S(\sigma)| \le T \log_2(d-1) + H((T, 1-T)). \tag{6}$$

In fact, by construction of this bound, there is no sharper
bound than this one that exploits knowledge of $T$ and $d$ only.
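
To make the comparison between the bounds concrete, here is a minimal Python sketch that evaluates the right-hand sides of (4), (5) and (6) as functions of T and d. The function names `sharp_bound` and `fannes_bound` are hypothetical, and the convention 0·log2(0) = 0 is assumed throughout.

```python
import math

def h2(x):
    """Scalar term -x*log2(x), with the convention 0*log2(0) = 0."""
    return 0.0 if x <= 0.0 else -x * math.log2(x)

def sharp_bound(T, d):
    """RHS of inequality (6): T*log2(d-1) + H((T, 1-T)), for d >= 2."""
    return T * math.log2(d - 1) + h2(T) + h2(1.0 - T)

def fannes_bound(T, d):
    """RHS of (4) for 0 <= T <= 1/(2e), and RHS of (5) for larger T."""
    if T <= 0.0:
        return 0.0  # limiting value of (4) at T = 0
    if T <= 1.0 / (2.0 * math.e):
        return 2.0 * T * math.log2(d) - 2.0 * T * math.log2(2.0 * T)
    return 2.0 * T * math.log2(d) + 1.0 / (math.e * math.log(2.0))

# Print both bounds side by side for a few values of T in dimension 3.
for T in (0.05, 0.25, 0.75):
    print(T, sharp_bound(T, 3), fannes_bound(T, 3))
```

On a grid of T values one can check numerically that the sharp bound never exceeds the Fannes-type bounds, in line with the claim that it improves on them.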

To show that sharpness holds for any value of $T$ and $d$,
we just note that the following pair of (commuting) states
achieves the bound:

$$\rho = \mathrm{Diag}(1-T, T/(d-1), \ldots, T/(d-1)) \tag{7}$$
$$\sigma = \mathrm{Diag}(1, 0, \ldots, 0). \tag{8}$$

FIG. 2: Same as Fig. 1, but for qutrits ($d = 3$).

FIG. 3: Same as Fig. 1, but for 4-dimensional quantum systems ($d = 4$).

In other notations:

$$\rho = \left(1 - \frac{Td}{d-1}\right) |0\rangle\langle 0| + \frac{Td}{d-1}\, \frac{\mathbb{1}}{d} \tag{9}$$
$$\sigma = |0\rangle\langle 0|. \tag{10}$$

Note that the coefficient of $|0\rangle\langle 0|$ in $\rho$ may be
negative. A simple calculation then yields that their trace norm distance
is $T$, and their entropy difference is $T \log_2(d-1) + H((T, 1-T))$.
We once again stress that Fannes' original bound is not sharp: there are
no pairs of states saturating Fannes' bound except in the trivial case
when they are identical ($T = 0$).
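
The "simple calculation" can also be checked numerically. The sketch below, in plain Python, builds the diagonals of the two commuting states (7) and (8) and compares their entropy difference with the right-hand side of (6); since the states are diagonal, the Shannon entropy of the diagonals is used. The helper names are hypothetical.

```python
import math

def h2(x):
    """Scalar term -x*log2(x), with the convention 0*log2(0) = 0."""
    return 0.0 if x <= 0.0 else -x * math.log2(x)

def shannon(p):
    """Shannon entropy (in bits) of a probability vector."""
    return sum(h2(x) for x in p)

def trace_distance(p, q):
    """Trace norm distance of two commuting states given by their diagonals."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def saturating_pair(T, d):
    """Diagonals of the states (7) and (8)."""
    p = [1.0 - T] + [T / (d - 1)] * (d - 1)
    q = [1.0] + [0.0] * (d - 1)
    return p, q

def sharp_bound(T, d):
    """RHS of inequality (6)."""
    return T * math.log2(d - 1) + h2(T) + h2(1.0 - T)

def gap(T, d):
    """Entropy difference minus the bound; expected to vanish up to rounding."""
    p, q = saturating_pair(T, d)
    return abs(shannon(p) - shannon(q)) - sharp_bound(trace_distance(p, q), d)
```

Evaluating `gap(T, d)` for any 0 ≤ T ≤ 1 and d ≥ 2 should return a value that is zero up to floating-point rounding.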



II. PROOF 

The remainder of this paper will be devoted to the proof of 
our inequality. Because of its complexity, we will proceed in 
several stages. 



A. Reduction to classical case 

The first step of the proof is to reduce the statement to the
commuting (classical) case. Since $S$ is unitarily invariant,
$S(\rho)$ only depends on the eigenvalues of $\rho$. Let us denote the
eigenvalue decompositions of $\rho$ and $\sigma$ by
$\rho = V\,\mathrm{Diag}(\lambda_\rho)\,V^*$ and
$\sigma = W\,\mathrm{Diag}(\lambda_\sigma)\,W^*$; here,
$\lambda_\rho = \mathrm{Eig}^{\downarrow}(\rho)$. The LHS of (6) then
becomes $|H(\lambda_\rho) - H(\lambda_\sigma)|$, and the trace norm
distance, which is the only ingredient of the RHS that depends on the
states, is given by
$T = \|\mathrm{Diag}(\lambda_\rho) - U\,\mathrm{Diag}(\lambda_\sigma)\,U^*\|_1/2$,
where $U = V^* W$.

Let us now fix the eigenvalues of $\rho$ and $\sigma$; the only degree
of freedom is then in the unitary matrix $U$, which only appears in the
RHS. The LHS is thus fixed, while the RHS can be varied. Referring to the
Figures, this amounts to looking at cross-sections of the plot along the
horizontal lines. To prove correctness of the bound (6), we have to look
at the points of minimal (leftmost) and maximal (rightmost) trace norm
distance. Inequality IV.62 in [2], which essentially seems to be due to
Mirsky [4], reads:

$$|||\mathrm{Eig}^{\downarrow}(A) - \mathrm{Eig}^{\downarrow}(B)||| \le |||A - B||| \le |||\mathrm{Eig}^{\downarrow}(A) - \mathrm{Eig}^{\uparrow}(B)|||,$$

for all Hermitian $A$ and $B$ and all unitarily invariant norms.
In particular, we get that the extremal values of
$T = \|\mathrm{Diag}(\lambda_\rho) - U\,\mathrm{Diag}(\lambda_\sigma)\,U^*\|_1/2$,
when varying $U$, are obtained for $U$ equal to certain permutation
matrices. More precisely, the minimal value is obtained for
$U = \mathbb{1}$, and the maximal value for $U$ the permutation matrix
that totally reverses the diagonal entries.

This shows, in particular, that the boundary of the "point cloud" can be
found for diagonal $\rho$ and $\sigma$, i.e. for commuting states.
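
The role of the two special orderings can be illustrated with a small brute-force check. The Python sketch below is restricted to permutation matrices (it does not sample general unitaries, so it illustrates rather than verifies the inequality above): for two eigenvalue vectors it enumerates all permutations of the second one and compares the smallest and largest distances with those obtained for the identical and the fully reversed ordering. All names are illustrative.

```python
import itertools
import random

def l1_half(a, b):
    """Half the l1 distance between two equally long vectors."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))

def random_probability_vector(d, rng):
    """Normalised vector of d uniform random numbers (assumed ensemble)."""
    w = [rng.random() for _ in range(d)]
    s = sum(w)
    return [x / s for x in w]

def extremes_over_permutations(lam_rho, lam_sigma):
    """Return (min, max) over all permutations, plus the values for the
    same ordering and for the reversed ordering of the sorted vectors."""
    a = sorted(lam_rho, reverse=True)
    b = sorted(lam_sigma, reverse=True)
    values = [l1_half(a, perm) for perm in itertools.permutations(b)]
    return min(values), max(values), l1_half(a, b), l1_half(a, b[::-1])

rng = random.Random(1)
lam_rho = random_probability_vector(4, rng)
lam_sigma = random_probability_vector(4, rng)
lo, hi, same_order, reversed_order = extremes_over_permutations(lam_rho, lam_sigma)
```

The enumeration grows factorially with the dimension, so this is only practical for small d.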



B. Proof Strategy 

In the following we can therefore restrict to the commuting case and
only look at (discrete) probability distributions and their Shannon
entropies. To highlight the classical nature of the remainder of the
proof, we will replace the states $\rho$ and $\sigma$ by $d$-dimensional
probability vectors $p$ and $q$. We have to show that the following
inequality holds:

$$|H(p) - H(q)| \le T \log_2(d-1) + H((T, 1-T)), \tag{11}$$

where $T$ is now

$$T := \frac{1}{2} \sum_{i=1}^{d} |p_i - q_i|. \tag{12}$$

We will do this in a constructive way, by fixing $T$ and looking for
pairs $p$, $q$ that maximise the LHS. The maximal value of the LHS thus
obtained then will be a sharp upper bound by construction.

At this point it is interesting to mention that simple things don't
work. For example, it is not obvious that $|H(p) - H(q)|$ should be
maximal for $p$ "pure", because this quantity is neither convex nor
concave, and furthermore is to be maximised over the rather complicated
set of all $(p, q)$ such that $(1/2) \sum_i |p_i - q_i| = T$,
$p_i \ge 0$, $q_i \ge 0$ and $\sum_i p_i = \sum_i q_i = 1$ hold.

Let us introduce the symbol $\delta := q - p$. Since $p$ and $q$ are
probability vectors, the $\delta_i$ are real numbers adding up to 0.
We can decompose $\delta$ in a positive and negative part, which we
denote by $\delta^+$ and $\delta^-$. Thus we have
$\delta = \delta^+ - \delta^-$. Both parts consist of non-negative reals
and their elementwise product $\delta_i^+ \delta_i^-$ is 0. The
constraint (12) then translates to $\sum_i \delta_i^+ = T$ and
$\sum_i \delta_i^- = T$.

In the following, we will shift attention to the quantity
$H(q) - H(p)$ (without taking absolute values) and try to find its
global minimum. Subsequently taking the absolute value then yields the
maximum of $|H(q) - H(p)|$.



C. The case d = 2 

When $d = 2$, we automatically get that $\delta$ must be given by
$\delta = (+T, -T)$. The quantity to be minimised is then

$$H(q) - H(p) = H((p_1 + T, 1 - p_1 - T)) - H((p_1, 1 - p_1)),$$

where $p_1$ is the first entry of $p$. As this quantity is obtained by
setting $d = 2$ in (23) below, we need not spend more time on this
special case. The reader is advised to proceed to the end of
subsection G and thereby collect a free parking token [8].



D. Optimal $\delta^+$

We will prove here that the optimal $\delta^+$ is "rank 1"; that is, it
has just one non-zero entry, which then is given by $T$. W.l.o.g., since
nothing has been claimed yet about $p$ or $q$ themselves, we can put
this non-zero entry on the first position. Furthermore, $\delta^-$ can
then take non-zero values on all positions except the first one.

Letting $p_1$ be the first entry of $p$, $p$ and $q$ must then be of
the form

$$p = (p_1, (1-p_1) r) \tag{13}$$
$$q = (p_1 + T, (1-p_1) r - T s), \tag{14}$$

where $r$ and $s$ are $(d-1)$-dimensional probability vectors,
with the restrictions

$$p_1 + T \le 1 \tag{15}$$
$$(1-p_1) r - T s \ge 0. \tag{16}$$

Here, $Ts$ is just $\delta^-$. The value of $H(q) - H(p)$ corresponding
to this is given by

$$H(q) - H(p) = H(p_1 + T) - H(p_1) + H((1-p_1) r - T s) - H((1-p_1) r). \tag{17}$$

The remaining minimisation over $r$, $s$ and $p_1$ will be performed in
the subsequent stages.

Proof. Let us now prove that the optimal $\delta^+$ must indeed be
rank 1. So we put $q = p + \delta^+ - \delta^-$ and fix $p$ and
$\delta^-$, under the restriction $p - \delta^- \ge 0$. The
restrictions on $\delta^+$ are, as mentioned before, $\delta^+ \ge 0$,
$\delta_i^+ \delta_i^- = 0$, and $\sum_i \delta_i^+ = T$. Hence,
$\delta^+$ is restricted to a convex set. If $\delta^-$ is 0 on the
positions 1 to $k$, say, then the extremal points of this convex set are
given by $T e^1, T e^2, \ldots, T e^k$. Now the optimality of one of
these extremal points follows because
$H(q) - H(p) = H(p + \delta^+ - \delta^-) - H(p)$ is concave in
$\delta^+$ (since $H$ is concave), and it is well-known that concave
functions reach their global minimum over a convex set in one (or more)
of the extremal points of that set. □
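
The concavity argument can be illustrated numerically. In the hedged Python sketch below, the vectors `p` and `delta_minus` are arbitrary example values (not taken from the paper); the code compares the value of H(q) − H(p) at the extremal choices of the positive part with the values at randomly sampled admissible positive parts.

```python
import math
import random

def h2(x):
    """Scalar term -x*log2(x), with the convention 0*log2(0) = 0."""
    return 0.0 if x <= 0.0 else -x * math.log2(x)

def shannon(p):
    """Shannon entropy (in bits) of a probability vector."""
    return sum(h2(x) for x in p)

def entropy_gain(p, delta_plus, delta_minus):
    """H(q) - H(p) for q = p + delta_plus - delta_minus."""
    q = [a + u - v for a, u, v in zip(p, delta_plus, delta_minus)]
    return shannon(q) - shannon(p)

# Example data (illustrative only): delta_minus sums to T and p - delta_minus >= 0.
p = [0.1, 0.2, 0.3, 0.4]
T = 0.3
delta_minus = [0.0, 0.0, 0.1, 0.2]
free = [0, 1]  # positions where delta_minus vanishes, so delta_plus may live there

def vertex(i):
    """Extremal point T*e^i of the feasible set for delta_plus."""
    return [T if j == i else 0.0 for j in range(len(p))]

vertex_min = min(entropy_gain(p, vertex(i), delta_minus) for i in free)

rng = random.Random(0)

def random_delta_plus():
    """Random non-negative vector supported on `free` with entries summing to T."""
    w = [rng.random() for _ in free]
    s = sum(w)
    d = [0.0] * len(p)
    for i, x in zip(free, w):
        d[i] = T * x / s
    return d

sampled_min = min(entropy_gain(p, random_delta_plus(), delta_minus)
                  for _ in range(1000))
```

In this example no sampled interior point should fall below `vertex_min`, as expected for a concave function on a convex set.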
E. Optimal $(1-p_1) r - T s$

Next, we minimise $H(q) - H(p)$ over $r$ and $s$, which are general
$(d-1)$-dimensional probability vectors. By (17), we have to minimise
$H((1-p_1) r - T s) - H((1-p_1) r)$. The only extra condition on $r$ and
$s$ is $(1-p_1) r - T s \ge 0$. We will show that minimality is achieved
when $(1-p_1) r - T s$ is rank 1.

Proof. Given that $r$ and $s$ are probability vectors and that the
condition $(1-p_1) r - T s \ge 0$ is satisfied,
$((1-p_1) r - T s)/(1 - p_1 - T)$ is also a probability vector, which we
will denote by $\eta$. Thus $(1-p_1) r - T s = (1 - p_1 - T)\eta$.
Conversely, for any pair of probability vectors $s$ and $\eta$,
$r' := ((1 - p_1 - T)\eta + T s)/(1 - p_1)$ is a probability vector
satisfying $(1-p_1) r' - T s \ge 0$. Therefore, we can do the
substitution $(1-p_1) r - T s = (1 - p_1 - T)\eta$ and forget about $r$
altogether. Thus we are down to minimising

$$H((1 - p_1 - T)\eta) - H((1 - p_1 - T)\eta + T s)$$

over all probability vectors $\eta$ and $s$.

Now note that for all $x, y > 0$, $H(x) - H(x+y)$ is concave and
monotonically increasing in $x$. Indeed, the first derivative w.r.t. $x$
is $\log_2(1 + y/x) \ge 0$ and the second derivative is
$-y/(x(x+y)\ln 2) \le 0$. Thus, as in the previous stage, we can
conclude that $H((1 - p_1 - T)\eta) - H((1 - p_1 - T)\eta + T s)$ is
minimal for an extremal $\eta$. Since we haven't yet decided on $s$, we
will put w.l.o.g. $\eta = e^1$. □
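
For completeness, the two derivatives used in this proof can be worked out explicitly from the definition of the scalar function $H(x) = -x\log_2 x$. This is a routine calculation written out in LaTeX; with base-2 logarithms the second derivative carries a factor $1/\ln 2$, which does not affect its sign.

```latex
\begin{align*}
f(x)   &:= H(x) - H(x+y) = -x\log_2 x + (x+y)\log_2(x+y), \\
f'(x)  &= -\log_2 x - \frac{1}{\ln 2} + \log_2(x+y) + \frac{1}{\ln 2}
        = \log_2\!\left(1 + \frac{y}{x}\right) \ge 0, \\
f''(x) &= \frac{1}{\ln 2}\left(\frac{1}{x+y} - \frac{1}{x}\right)
        = -\frac{y}{x(x+y)\ln 2} \le 0 .
\end{align*}
```

Hence $f$ is increasing and concave for $x, y > 0$; equivalently, $x \mapsto H(x+y) - H(x)$ is decreasing and convex.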



With this optimal value for $\eta$, and putting

$$s = (s_1, (1 - s_1)\phi)$$

(with $\phi$ a $(d-2)$-dimensional probability vector), we get

$$H((1 - p_1 - T)\eta) - H((1 - p_1 - T)\eta + T s) = H(1 - p_1 - T) - H(1 - p_1 - T(1 - s_1)) - H(T(1 - s_1)\phi).$$

The remaining minimisation over $s$ now consists of first minimising
over $\phi$, and then over $s_1$.

The minimisation over $\phi$ is easy, because it only involves the term
$H(T(1 - s_1)\phi)$, without any constraint other than that $\phi$ be a
probability vector. This term achieves its maximum when
$\phi = (1, 1, \ldots, 1)/(d-2)$, the uniform distribution, and the
maximum value is $T(1 - s_1)\log_2(d-2) + H(T(1 - s_1))$.

We are now left with a minimisation over $s_1$ of the function

$$H(1 - p_1 - T) - H(1 - p_1 - T(1 - s_1)) - T(1 - s_1)\log_2(d-2) - H(T(1 - s_1)). \tag{18}$$

We will tackle this minimisation in the next stage.

F. Optimal $s_1$

In terms of $s_1$, (18) is the sum of a linear term,

$$H(1 - p_1 - T) - T(1 - s_1)\log_2(d-2),$$

and the non-linear term

$$-H(1 - p_1 - T(1 - s_1)) - H(T(1 - s_1)).$$

This term is of the form $-H(y - x) - H(x)$, with $0 \le x \le y$, and
is therefore convex in $s_1$. The only constraint on $s_1$ is that it be
in the interval $[0, 1]$.

We therefore have to find the local minimum of (18); by convexity of the
function, we are guaranteed there is only one. If this minimum is inside
the feasible interval $0 \le s_1 \le 1$, then this gives the answer; if
it is outside it, then the minimum of the constrained minimisation is
attained at either 0 or 1, depending on the location of the local
minimum.

The derivative of (18) w.r.t. $s_1$ is

$$T\left(\log_2(d-2) + \log_2(1 - p_1 - T(1 - s_1)) - \log_2(T(1 - s_1))\right).$$

For $T > 0$, this is 0 when

$$(d-2)(1 - p_1 - T(1 - s_1)) = T(1 - s_1),$$

that is, when

$$T(1 - s_1) = \frac{(d-2)(1 - p_1)}{d-1}.$$

Recall that from the restriction $p_1 \le 1 - T$ follows
$T \le 1 - p_1$. As the LHS lies between 0 and $T$, we have to consider
two cases.

Case (i) - If $0 < T < (d-2)(1 - p_1)/(d-1)$, the local optimum cannot
be achieved, and we have to take the nearest point, which is where
$T(1 - s_1) = T$, i.e. $s_1 = 0$. Then the minimum of (18) is given by

$$-T\log_2(d-2) - H(T). \tag{19}$$

Case (ii) - If $(d-2)(1 - p_1)/(d-1) \le T \le 1 - p_1$, the local
optimum is a feasible point, and we can put
$T(1 - s_1) = (d-2)(1 - p_1)/(d-1)$. For the minimum of (18) this gives

$$H(1 - p_1 - T) - H\!\left(\frac{1 - p_1}{d-1}\right) - \frac{(d-2)(1 - p_1)}{d-1}\log_2(d-2) - H\!\left(\frac{(d-2)(1 - p_1)}{d-1}\right). \tag{20}$$
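
The two cases can be cross-checked against a direct numerical minimisation. The following Python sketch writes expression (18) in terms of the variable x = T(1 − s_1), with 0 ≤ x ≤ T, and compares a brute-force grid minimum with the value obtained at the point prescribed by cases (i) and (ii). It assumes d ≥ 3 (so that log2(d − 2) is finite) and T ≤ 1 − p_1; the function names and the grid resolution are arbitrary illustrative choices.

```python
import math

def h2(x):
    """Scalar term -x*log2(x), with the convention 0*log2(0) = 0."""
    return 0.0 if x <= 0.0 else -x * math.log2(x)

def objective(x, p1, T, d):
    """Expression (18) as a function of x = T*(1 - s1), for d >= 3."""
    return (h2(1.0 - p1 - T) - h2(1.0 - p1 - x)
            - x * math.log2(d - 2) - h2(x))

def closed_form_minimum(p1, T, d):
    """Value (19) in case (i), value (20) in case (ii)."""
    x_star = (d - 2) * (1.0 - p1) / (d - 1)  # stationary point of (18)
    x = min(T, x_star)  # case (i): x = T (s1 = 0); case (ii): x = x_star
    return objective(x, p1, T, d)

def grid_minimum(p1, T, d, n=20000):
    """Brute-force minimum of (18) over an equally spaced grid on [0, T]."""
    return min(objective(T * k / n, p1, T, d) for k in range(n + 1))

# One example per case: (p1, T, d) = (0.1, 0.2, 4) falls in case (i),
# (0.5, 0.45, 3) falls in case (ii).
for p1, T, d in [(0.1, 0.2, 4), (0.5, 0.45, 3)]:
    print(closed_form_minimum(p1, T, d), grid_minimum(p1, T, d))
```

Because the objective is convex in x, the grid minimum can only overshoot the true minimum, so the two printed numbers in each row should agree up to the grid resolution.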



G. Optimal $p_1$

For the final step of the procedure, we have to find the $p_1$ that
minimises the complete expression of the minimum of $H(q) - H(p)$ that
we have found so far, under the restriction $0 \le p_1 \le 1 - T$. We
have to consider the two cases from the previous stage.

Case (i) - If $T < (d-2)(1 - p_1)/(d-1)$, that is,
$0 \le p_1 < 1 - (d-1)T/(d-2)$, we need to minimise

$$H(p_1 + T) - H(p_1) - T\log_2(d-2) - H(T). \tag{21}$$

This case only occurs when $T < (d-2)/(d-1)$. By a previously obtained
result, the function $x \mapsto H(x+y) - H(x)$ is monotonically
decreasing in $x$ (and convex). Its minimum therefore occurs for the
largest possible value of $p_1$, which in this case is
$p_1 = 1 - (d-1)T/(d-2)$. This gives as minimal value

$$H(1 - T/(d-2)) - H(1 - (d-1)T/(d-2)) - T\log_2(d-2) - H(T). \tag{22}$$

Case (ii) - If $(d-2)(1 - p_1)/(d-1) \le T \le 1 - p_1$, that is,
$1 - (d-1)T/(d-2) \le p_1 \le 1 - T$, we need to minimise

$$H(p_1 + T) - H(p_1) + H(1 - p_1 - T) - H\!\left(\frac{1 - p_1}{d-1}\right) - \frac{(d-2)(1 - p_1)}{d-1}\log_2(d-2) - H\!\left(\frac{(d-2)(1 - p_1)}{d-1}\right). \tag{23}$$



The derivative of (23) w.r.t. $p_1$ equals the logarithm of

$$\frac{(d-1)\,p_1\,(1 - p_1 - T)}{(1 - p_1)(p_1 + T)}.$$

This expression obviously decreases with $T$, and for the minimal
allowed value $T = (d-2)(1 - p_1)/(d-1)$ it is given by

$$1 - \frac{(d-2)(1 - p_1)}{d - 2 + p_1},$$

which is easily seen to be below 1; its logarithm is therefore
negative. Consequently, the derivative of (23) is negative over the
range under consideration. We conclude that (23) is minimal for the
maximal allowed $p_1$, which is $p_1 = 1 - T$.

This gives as minimal value for $H(q) - H(p)$

$$H(1) - H(1 - T) + H(0) - H\!\left(\frac{T}{d-1}\right) - \frac{(d-2)T}{d-1}\log_2(d-2) - H\!\left(\frac{(d-2)T}{d-1}\right),$$

which simplifies to

$$-(T\log_2(d-1) + H(T) + H(1 - T)). \tag{24}$$



The final step is now to take the minimum of the two cases (22) and
(24), the former one only being valid for $T < (d-2)/(d-1)$. From the
fact that $H(x+y) - H(x)$ is monotonically decreasing in $x$ one deduces
the relation $H(1-a) + H(1-b) \ge H(1-a-b)$, for $0 \le a, b$ and
$a + b \le 1$. The terms $H(1 - T/(d-2)) - H(1 - (d-1)T/(d-2))$ in (22)
are therefore larger than the term $-H(1 - T)$ in (24). Furthermore,
$-T\log_2(d-2)$ is larger than $-T\log_2(d-1)$. Hence, (24) is always
smaller than (22).

Taking absolute values and noting that (24) is always negative then
finally yields inequality (11). □
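
As a final sanity check in the spirit of the Introduction, the classical inequality can be tested on random pairs of probability vectors. The Python sketch below is only an illustration, not a substitute for the proof; the sampling scheme (normalised fourth powers of uniform random numbers, chosen to produce some skewed distributions) and the numerical tolerance are arbitrary assumptions.

```python
import math
import random

def h2(x):
    """Scalar term -x*log2(x), with the convention 0*log2(0) = 0."""
    return 0.0 if x <= 0.0 else -x * math.log2(x)

def shannon(p):
    """Shannon entropy (in bits) of a probability vector."""
    return sum(h2(x) for x in p)

def random_probability_vector(d, rng):
    """Normalised vector of skewed random weights (assumed ensemble)."""
    w = [rng.random() ** 4 for _ in range(d)]
    s = sum(w)
    return [x / s for x in w]

def satisfies_bound(p, q, tol=1e-9):
    """Check |H(p) - H(q)| <= T*log2(d-1) + H((T, 1-T)) up to a tolerance."""
    d = len(p)
    T = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    lhs = abs(shannon(p) - shannon(q))
    rhs = T * math.log2(d - 1) + h2(T) + h2(1.0 - T)
    return lhs <= rhs + tol

rng = random.Random(42)
violations = 0
for d in range(2, 7):
    for _ in range(2000):
        p = random_probability_vector(d, rng)
        q = random_probability_vector(d, rng)
        if not satisfies_bound(p, q):
            violations += 1
print("violations:", violations)
```

Random sampling rarely comes close to the extremal pairs, so a run without violations is consistent with the theorem but says little about sharpness.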